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Abstract 

This paper introduces the generalized forward-backward splitting algorithm for 
minimizing convex functions of the form F + £™ =1 Gj, where F has a Lipschitz- 
continuous gradient and the Gi's are simple in the sense that their Moreau proxim- 
ity operators are easy to compute. While the forward-backward algorithm cannot 
deal with more than n = 1 non-smooth function, our method generalizes it to the 
case of arbitrary n. Our method makes an explicit use of the regularity of F in the 
forward step, and the proximity operators of the G^'s are applied in parallel in the 
backward step. This allows the generalized forward-backward to efficiently address 
an important class of convex problems. We prove its convergence in infinite dimen- 
sion, and its robustness to errors on the computation of the proximity operators 
and of the gradient of F. Examples on inverse problems in imaging demonstrate 
the advantage of the proposed methods in comparison to other splitting algorithms. 



1 Introduction 

Throughout this paper, 7i denotes a real Hilbert space endowed with scalar product 
(• | •} and associated norm || ■ ||, and n is a positive integer. We consider the following 
minimization problem 

n 

mm{*(z) d = f F(x) + (1) 

where all considered functions belong to the class To(%) of lower semicontinuous, proper 
(its domain is non-empty) and convex functions from % to ]-oo,+oo]. 

1.1 State-of-the-Art in Splitting Methods 

The decomposition ([I]) is fairly general, and a wide range of iterative algorithms 
takes advantage of the specific properties of the functions in the summand. One crucial 
property is the possibility to compute the associated proximity operators [S3] , defined as 

prox G (a;) = f argmini||2;-?/|| 2 + G{y). (2) 



This is in itself a convex optimization problem, which can be solved efficiently for many 
functions, e.g. when the solution, unique by strong convexity, can be written in closed 
form. Such functions are referred to as "simple". 

Another important feature is the differentiability of the functional to be minimized. 
However, gradient-descent approaches do not apply as soon as one of the functions Gi is 
non-smooth. For n=\ and G\ simple, the forward-backward algorithm circumvents this 
difficulty if F is differentiate with a Lipschitz-continuous gradient. This scheme consists 
in performing alternatively a gradient-descent (corresponding to an explicit step on the 
function F) followed by a proximal step (corresponding to an implicit step on the function 
G\). Such a scheme can be understood as a generalization of the projected gradient 
method. This algorithm has been well studied |50| HQl [671 EH EE EH Ej- Accelerated 
multistep versions have been proposed |55[ [70] [6], that enjoy a faster convergence rate 
of 0(l/t 2 ) on the objective <f. 

Other splitting methods do not require any smoothness on some part of the composite 
functional \F The Douglas- Rachford |27j and Peaceman-Rachford |57| schemes were 
developed to minimize G\{x) + G2{x), provided that G\ and G2 are simple |4"? ] |4"5 | |33" 1 ITT] 
and rely only on the use of proximity operators. The backward-backward algorithm |46| 
l56l UJ [5j [T7] can be used to minimize ^(x) = G±(x) + G%(x) when the functions involved 
are the indicator functions of non-empty closed convex sets, or involve Moreau envelopes. 
Interestingly, if one of the functions G\ or G% is a Moreau envelope and the other is 
simple, the forward-backward algorithm amounts to a backward-backward scheme. 

If L is a bounded injective linear operator, it is possible to minimize ^(x) = G\ o 
L(x) + G2{x) by applying these splitting schemes on the Fenchel-Rockafellar dual prob- 
lem. It was shown that applying the Douglas-Rachford scheme leads to the alternat- 
ing direction method of multipliers (ADMM) |39[ HUJ [5T| U21 [33] . For non-necessarily 
injective L and G2 strongly convex with a Lipschitz-continuous gradient, the forward- 
backward algorithm can be applied to the Fenchel-Rockafellar dual [36, 19J. Dealing 
with an arbitrary bounded linear operator L can be achieved using primal-dual methods 
motivated by the classical Kuhn- Tucker theory. Starting from methods to solve saddle 
function problems such as the Arrow-Hurwicz method [2j and its modification |60| . the 
extragradient method |44| . this problem has received a lot of attention more recently 

pa esi ES Eg Hang. 

It is also possible to extend the Douglas-Rachford algorithm to an arbitrary number 
n > 2 of simple functions. Inspired by the method of partial inverses |65[ Section 5], most 
methods rely either explicitly or implicitly on introducing auxiliary variables and bringing 
back the original problem to the case n - 2 in the product space T~L n . Doing so yields 
iterative schemes in which one performs independent parallel proximal steps on each of 
the simple functions and then computes the next iterate by essentially averaging the 
results. Variants have been proposed in |2T] and |34| . who describe a general projective 
framework that does not reduce the problem to the case n = 2. Note however that these 
extensions do not apply to the forward-backward scheme that can only handle n = 1. It 
is at the heart of this paper to present such an extension. 

Recently proposed methods extend existing splitting schemes to handle the sum of 
any number of n > 2 composite functions of the form Gi = Hi o Li, where the H^s are 
simple and the Lj's are bounded linear operators. Let us denote Li* the adjoint operator 
of Li. If Li satisfies LiL* = vl& for any v > (it is a so-called tight frame), Hi o L, is 
simple as soon as Hi is simple and Lj* is easy to compute |20| . This case thus reduces to 
the previously reviewed ones. If Li is not a tight frame but (Id+Lj*Lj) or (Id+LjLj*) 
is easily invertible, it is again possible to reduce the problem to the previous cases by 
introducing as many auxiliary variables as the number of Li's each belonging to the range 
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of Li. Note however that, if solved with the Douglas- Rachford algorithm on the product 
space, the auxiliary variables are also duplicated, which would increase significantly the 
dimensionality of the problem. Some dedicated parallel implementations were specifically 
designed for the case where (J^ Li* Li) or LiLi*) is (easily) invertible, see for instance 
|32[I58|. If the Lj's satisfy none of the above properties, it is still possible to call on primal- 
dual methods, either by writing = Ya=i Hi(Lix) = G{Lx) with L{x) = (Li(x)) i and 
G (( x i)i) = ZiHi(xi), see for instance |29j; or = ts((xj)J + E^i^L^;) [9], 

where S is the closed convex set defined in Section [3.21 

In spite of the wide range of already existing proximal splitting methods, none seems 
satisfying to address explicitly the case where n > 1 and F is smooth but not necessarily 
simple. A workaround that has been proposed previously used nested algorithms to com- 
pute the proximity operator of Y,i G% within sub-iterations, see for instance |30[ 114]; this 
leads to practical as well as theoretical difficulties to select the number of sub-iterations. 
More recently, [53] proposed an algorithm for minimizing ^(x) = F(x) + G{x) under 
linear constraints. We show in Section [5] how this can be adapted to adress the general 
problem ([I]) while achieving full splitting of the proximity operators of the Gi 's and using 
the gradient of F. It suffers however from limitations, in particular the introduction of 
many auxiliary variables and the fact that the gradient descent can't be directly applied 
to the minimizer; see Section [5] and [6] for details. The generalized forward-backward 
algorithm introduced in this paper is intended to avoid all those shortcomings. 

As this paper was being finalized, the authors in [23] independently developed a 
primal-dual algorithm to solve a class of problems that cover those we consider here. 
Their approach and algorithm are however very different from ours in many important 
ways. We will provide a detailed comparison with this work in Section [5] and will also 
show on numerical experiments in Section [6] that our algorithm seems more adapted for 
problems of the form ([I]). 

1.2 Applications in Image Processing 

Many imaging applications require solving ill-posed inverse problems to recover high 
quality images from low-dimensional and noisy observations. These challenging problems 
necessitate the use of regularization through prior models to capture the geometry of 
natural signals, images or videos. The resolution of the inverse problem can be achieved 
by minimizing objective functionals, with respect to a high-dimensional variable, that 
takes into account both a fidelity term to the observations and regularization terms 
reflecting the priors. Clearly, such functionals are composite by construction. Section [6] 
details several examples of such inverse problems. 

In many situations, this leads to the optimization of a convex functional that can 
be split into the sum of convex smooth and non-smooth terms. The smooth part of the 
objective is in some cases a data fidelity term and reflects some specific knowledge about 
the forward model, i.e. the noise and the measurement /degradation operator. This is for 
instance the case if the operator is linear and the noise is additive Gaussian, in which case 
the data fidelity is a quadratic function. The most successful regularizations that have 
been advocated are non-smooth, which typically allow to preserve sharp and intricate 
structures in the recovered data. Among such priors, sparsity-promoting ones have 
become popular, e.g. the £i-norm of coefficients in a wisely chosen dictionary |49] . or total 
variation (TV) prior |63] , To better model the data, composite priors can be constructed 
by summing several suitable regularizations, see for instance the morphological diversity 
framework [66]. The proximity operator of the £i-norm penalization is a simple soft- 
thresholding [26], whereas the use of complex or mixed regularization priors justifies the 
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splitting of non-smooth terms in several simpler functions (see Section [6] for concrete 
examples). 

The composite structure of convex optimization problems raising when solving inverse 
problems in the form of a sum of simple and/or smooth functions involving linear opera- 
tors explains the popularity of proximal splitting schemes in imaging science. Depending 
on the structure of the objective functional as detailed in the previous section, one can 
resort to the appropriate splitting algorithm. For instance, the forward-backward algo- 
rithm and its modifications has become popular for sparse regularization with a smooth 
data fidelity, see for instance [38l ESJ El EE Q3J El [8]. The Douglas-Rachford and its 
parallelized extensions were also used in a variety of inverse problems implying only non- 
smooth functions, see for instance (20j ED [30l [101 EE1 EU [6TJ. The ADMM (which 
is nothing but Douglas-Rachford on the dual) was also applied to some linear inverse 
problems in |48[ 157] . Primal-dual schemes 1 12^ [29] are among the most flexible schemes 
to handle more complicated priors. The interested reader may refer to |66[ Chapter 7] 
and [22] for extensive reviews. 

1.3 Contributions and Paper Organization 

This paper introduces a novel generalized forward-backward algorithm to solve ([I]) 
when F is convex with a Lipschitz continuous gradient, and the Gj's are convex and 
simple. The algorithm achieves full splitting where all operators are used separately: an 
explicit step for VF (single- valued) and a parallelized implicit step through the proximity 
operators of the GVs. We prove convergence of the algorithm as well as its robustness to 
errors that may contaminate the iterations. To the best of our knowledge, it is among 
the first algorithms to tackle the case where n > 1 and F is smooth (see Section [5] for 
relation to a recent work developed in parallel to ours). Although our numerical results 
are reported only on imaging applications, the algorithm may prove useful for many 
other applications such as machine learning or statistical estimation. 

Section [2]presents the algorithm and state our main theoretical result. Section [3] that 
can be skipped by experienced readers, sets some necessary material from the framework 
of monotone operator theory. Section [4] reformulates the generalized forward-backward 
algorithm for finding the zeros of the sum of maximal monotone operators, and proves 
its convergence and its robustness. Special instances of the algorithm, its potential 
extensions and discussion of its relation to two alternatives in the literature are given in 
Section [5] Numerical examples are reported in Section [6] to show the usefulness of this 
approach for applications to imaging problems. 

2 Generalized Forward-Backward Algorithm 
for Minimization Problems 

We consider problem ([TJ where all functions are in Fo(H), F is differentiable on H 
with l//3-Lipschitz gradient where /3 e]0, +00 [, and for all i, Gi is simple. We also assume 
the following: 

(HI) The set of minimizers of ([I]) argmin('I') is non-empty; 
(H2) The domain qualification condition holds, i.e. 

(0,...,0) 6 sri{(x - yi, . . . ,x - y n ) \x£U and Vi, y-i e dom(Gj)} , 
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where dom(Gj) d = f {x e H\Gi(x) < +00} and sri(C) is the strong relative interior of a non- 
empty convex subset C of U 0]. Under pH)|(H2)| it follows from |62j H Theorem 16.2 
and Theorem 16.37(i)] that 

* argmin(^) = zer(<9^) = zer (VF + EjdGi) , 

where <9Gj denotes the subdifferential of Gi and zer(^4) is the set of zeros of a set-valued 
map A (see Definition 3.1 in Section 3.1). Therefore, solving ([I]) is equivalent to 



Find x 6 H such that e VF(x) + ^dG^O) 



(3) 



The generalized forward-backward we propose to minimize ([I]) (or equivalently to 
solve ([3])) is detailed in Algorithm [T] 



Algorithm 1 Generalized Forward- Backward Algorithm for solving ( 1|. 
(3' 1 e]0, +00 [ is the Lipschitz constant of VF; I\ is defined in Theorem 



2.1 



Reauire <*W.»] £ * ]0, if B.t. £? =1 u, = 1, 

M 7t e ]0, 2/3 [ Vt g IN, A t e/ A Vt6M. 
Initialization 

t <- 0. 

Main iteration 
repeat 

for i e [1, nj do 

2i <- Zj + At( proxTt G . (2x - Zj - 7tVF(x)) - 2); 

_ w i 1 

X <- Zi^iZf, 

t<r-t+l. 

until convergence ; 
Return x. 



To state our main theorem that ensures the convergence of the algorithm and its 
robustness, for each i let £i,t,i be the error at iteration t when computing prox7t G . at 
its argument, and let £2,* be the error at iteration t when applying VF to its argument. 
Algorithm [T] generates sequences (z{ t t) te ^, i e [l> n J an d ( x t)teKl' sucn that for all i and 
t, 

Zi,t+i = Zi,t + A t ( proxTt G (2x t - z ijt - -ft (VF(x t ) + e 2 ,t) ) + £\,t,i ~ x t ) . (4) 

The following theorem introduces two different sets of assumptions to guarantee 
convergence. Assumption |(A1)| allows one to use a greater range for the relaxation 
parameters At, while assumptions |(A2")| enables varying gradient-descent step-size 74 and 
ensures strong convergence in the uniformly convex case. Recall that a function F e 
Tq{%) is uniformly convex if there exists a non-decreasing function (p : [0, +oo[-> [0, +00] 
that vanishes only at 0, such that for all x and y in dom(F), the following holds 

V p e]0, 1[, F(px + (1 - p)y) + p(l - p)<p(\\x - y\\) < pF(x) + (1 - p)F(y). (5) 
Theorem 2.1. Set lim7t = 7 and define the following assumptions: 
(AO) (i) 0<hmA t <IWA i <min(§,±±^2); 

(ii) EtJo < +0°, and /or a// i, lki,*,tll < +0 °- 
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(Al) (i) Vi,7 t = 7e]0,2/3[; 

(ii) / A = ]0,mi I1 (^)[. 

(A2) (i) < lim7i < 7 < 2/3; 

(ii) J A =]0,1]. 



Suppose that (HI), (H2) anc? |(AO)] are satisfied. Then, if either \(Al)\ or |(A2)| is satisfied, 



( x t)tem defined in Q converges weakly towards a minimizer of ([T|. Moreover, i/ |(A2)| 
is satisfied and F is uniformly convex, the convergence is strong to the unique global 
minimizer of ([I]). 

This theorem will be proved after casting it in the more general framework of mono- 
tone operator splitting in Section HI 



Remark 2.1. The sufficient condition of strong convergence in Theorem 2.1 can be weak- 
ened, and other ones can be stated as well. Indeed, the generalized forward-backward 
algorithm has a structure that bears similarities with the classical forward-backward, 
since it consists of an explicit forward step, followed by an implicit step where the prox- 
imity operators are computed in parallel. In fact, it turns out that the backward step 
involves a firmly non-expansive operator (see next section), and therefore statements 
of |24[ Theorem 3.4(iv) and Proposition 3.6] can be transposed with some care to our 
algorithm. 

The formulation of Algorithm [T] is general, but it can be simplified for practical 
purposes. In particular, the auxiliary variables z% can all be initialized to 0, the weights 
Ui set equally to 1/n, and for simplicity the relaxation parameters Aj and the gradient- 
descent step-size 74 can be set constant along iterations. This is typically what has been 
done in the numerical experiments. 



3 Monotone Operators and Inclusions 

The subdifferential of a function in Tq(T-L) is the best- known example of maximal 
monotone operator. Therefore, it is natural to extend the generalized forward-backward, 
Algorithm [T] to find the zeros of the sum of maximal monotone operators, i.e. solve 
the monotone inclusion ([3j when the subdifferential is replaced by any maximal mono- 
tone operator. This is the goal pursued in Section [4] where we provide the proof of a 
general convergence and robustness theorem whose byproduct is a convergence proof of 
Theorem 12.11 

We first begin by recalling some essential definitions and properties of monotone 
operators that are necessary to our exposition. The interested reader may refer to |59[ |4"] 
for a comprehensive treatment. 

3.1 Definitions and Properties 

In the following, A : H ->■ 2^ is & set- valued operator, and Id is the identity operator 
on T~L. A\s single- valued if the cardinality of Ax is at most 1. 

Definition 3.1 (Graph, inverse, domain, range and zeros). The graph of A is the 
set gra(vl) d = f 6 T~L 2 | y e Ax}. The inverse of A is the operator whose graph is 

gra(A _1 ) = f {(x, y) e % 2 | (y, x) e gra(A)}. The domain of A is dom(^4) = f {x e % \ Ax + 0} 

The range of A is ran(j4) d = f {y e % \ 3x e T~L ■ y e Ax}, and its zeros set is zei(A) d = 
{x£n\0£ Ax} = A- l (0). 
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Definition 3.2 (Resolvant and reflection operators). The resolvant of A is the operator 
J A = ( Id +A) ■ The reflection operator associated to J a is the operator Ra = 2 J a - Id. 

Definition 3.3 (Maximal monotone operator). A is monotone if 

V x, y e T~L, u e and 1; e Ay =>(u-v\x-y)>0 . 

It is moreover maximal if its graph is not strictly contained in the graph of any other 
monotone operator. 

Definition 3.4 (Non-expansive and a-averaged operators). A is non-expansive if 

V x, y e T~L, u e Ax and v e => ||u - v\\ < \\x - y\\ . 

For a e]0, 1[, A is a-averaged if there exists non-expansive such that A = (1-a) Id +aR. 
We denote .A(a) the class of a-averaged operators on?i. In particular, is the class 

of firmly non- expansive operators. 

Note that non-expansive operators are necessarily single-valued and 1-Lipschitz con- 
tinuous, and so are a-averaged operators since they are also non-expansive. The following 
lemma gives some useful characterizations of firmly non-expansive operators. 

Lemma 3.1. Let A : dom(^4) = H -*■ T~L. The following statements are equivalent: 

(i) A is firmly non-expansive; 

(ii) 2A - Id is non-expansive; 

(hi) Vx,j/e^, \\Ax - Ay\\ 2 < (Ax - Ay\x- y); 

(iv) A is the resolvent of a maximal monotone operator A' , i.e. A - J a 1 ■ 



Proof. 



At -4 (!) <=> A - ld 2 R for some R non-expansive, (i) <=> (iii) see [72] , 
see~|5l]. " " □ 



We now summarize some properties of the subdifferential that will be useful in the 
sequel. 

Lemma 3.2. Let F : % -*■ H be a convex differentiate function, with 1/ (3-Lipschitz 
continuous gradient, (3 e]0, +oo[ ; and let G ■ T~L ->•]— 00, +00] be a function in Tq(T-L). 
Then, 

(i) /3Vi ? e^.(^), i.e. is firmly non- expansive; 

(ii) dG is maximal monotone; 

(iii) The resolvent of dG is the proximity operator of G, i.e. prox^ = Jog- 



Proof, (i) This is Baillon-Haddad theorem |3J. (ii) See |62| . (iii) See |54j . □ 



We thus consider in the following n maximal monotone operators 
Ai.TL^-2^ indexed by i e [l,w], and a (single- valued) operator B : % -* H and 
/3 e]0, +oo[ such that j3B e ^4(^)- Therefore, solving Q can be translated in the more 
general language of maximal monotone operators as solving the monotone inclusion 

Find x € V. such that e Bx + ^ AiX, (6) 

i 

where it is assumed that zer {B + Y,i Ai) t 0. 
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3.2 Product Space 

The previous definitions being valid for any real Hilbert space, they also apply to 
the product space T~L n endowed with scalar product and norm derived from the ones 
associated to %. 

Let (^i)je[i n ] e ]0> 1[™ such that Y,f=i w « = We consider H d = T~L n endowed with the 
scalar product (-|-), defined as 

n 

M x = (x i ) i ,y = (y i ) i e 71, (x\y) = Y / ^i(xi\yi} 

i=l 

and with the corresponding norm || • ||. S c *H denotes the closed convex set defined 
by S d = f {x = {xi) i & H\ x\ = X2 = ■■■ = x n }, whose orthogonal complement is the closed 
linear subspace S l = {x = {xi) i tH\Y.i ^i x i = 0}- We denote by Id the identity operator 
on *H, and we define the canonical isometry 

C : % -*■ S, x *-*■ (x, . . . , x) . 

is ■'H, -*•]— oo,+oo] and Ns ■ H -* 2^ are respectively the indicator function and the 
normal cone of S, that is 



. ,def JO if x e«S, dcf \S l if x 65, 

is{x) = \ . and N s (x) = i . 

+ oo otherwise , otherwise . 



Since S is non-empty closed and convex, it is straightforward to see that Ng is maximal 
monotone. To lighten the notation in the sequel, we introduce the following concatenated 
operators. For every i 6 [l,nj, let Ai and B as defined in ([6|. For 7 = (7i)j 6 n n ] 6 
]0,+oo[™, we define fA : H -*■ 2^,x = (x{)i >->■ Xf = i jiAi(xi), i.e. its graph is 

n 

gra(7A) d = Xgra(7i^i) 
i=l 

= {(x,y) 6 % 2 I x = (xi)i,y = (yi)i, and Vi, y« 6 ^AiXi] , 

and B .H ^H,x = (xi)i {Bxi)i. 

Using the maximal monotonicity of , . . . , ^4 ra and S it is an easy exercise to establish 
that ~fA and B are maximal monotone on H. 

4 Generalized Forward-Backward Algorithm 
for Monotone Inclusions 

Now that we have set all necessary material, we are ready to solve the monotone 
inclusion pj. First, we derive an equivalent fixed point equation satisfied by any solution 
of (§. Fr om this, we draw an algorithmic scheme and prove its convergence towards a 



solution, as well as its robustness to errors. Finally, we derive the proof of Theorem 2.1 



4.1 Fixed Point Equation 

From now on, we denote the set of fixed points of an operator T : H -*■ H by 
FixT d = {z£-H\Tz = z}. 
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Proposition 4.1. Let (wi)ie[in] e ]0>l[ n - F° r an V 7 > 0, x e % is a solution of Q if 
and only if there exists (^i)i£[i,n] € % n such that 



V i, Zi = Rj_ A (2x - Zi - 7-Bx) - jBct 



(7) 



Proof, set 7 > 0, we have the equivalence 



e Bx + Y, A 



V i, (x - - 7-Bx) e 7^4jX 



Now, 



(x - Zi - jBx) 6 7^x 
(by Lemma 3.1||(iv)[ ) 



7 

(2x - Zi — jBx) - x € — AjX 
Wi 

x = Jj_ j4 .(2x - - 7-Bx) 

2x - (2x - Zi) = 2Jsl a (2x -Zi- jBx) 

- (2x - Zi - ^Bx) - 7-Bx 
Zi = Rj_ A .(2x - z-i- jBx) - jBx . 



□ 



Before formulating a fixed point equation, consider the following preparatory lemma. 
Lemma 4.1. For all z = (zi)i e H, b = (b) i e S, and 7 = (7«)j e ]0, +oo[ n , 

(i) Jn s is the orthogonal projector on S, and Jn s z = C {Hi^i z i)j 

(ii) R Ns (z-b) = R Ns z - b; 

(iii) R lA z = (R^a^z,)).. 

Proof. 



i) From Lemma 3.2 (iii) we have for z e H 



J Ns (z) = argmin ye<s \z - y\\ = proj,s(z) . 

Now, argmin ye<s \\z - y\\ 2 = C (argmin, ye -^ Ei^ilki _ 2/|| 2 )j where the unique minimizer of 
£i&>i|#j ~y\\ 2 is t ne barycenter of (zi) i , i.e. Y.i ljJ i z i- 



ii) Jn s is obviously linear, and so is Rn s - Since b e S, RN s b = & an d t ne result 



follows. 
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This is a consequence of the separability of 7A in terms of the components of z 
implying that J jA z = (J^AiZ^i- The result follows from the definition of i? 7 A- D 

Proposition 4.2. (#i)ie[l,n] e % n satisfies ([7]) i/ anc? on/?/ if z - {zi) i is a fixed point of 
the following operator 



n 

z 



n 



l[Ry A R Ns +Id][ld -~fBJ Ns ](z) , 



(8) 



withy 



9 



Proof. Using Lemma 4.1 in Q, we have C(x) = Jn s z, C{Bx) = BJ^ s {z) and Rn s - 



^BJn s = i?jv s [Id - ^BJn s ]- Altogether, this yields, 

z satisfies Q -o- z = R-^aRns [id - T-BJyvs] 2 - jBJn s z 

o 2z = R^ A R Ns [ld - ^BJ Ns ]z + [Id - 7BJ A r s ]z 

o z=- [R.yARN s + Id] [id - jB J Ns ]z 



□ 



4.2 Algorithmic Scheme and Convergence 

The expression ^ gives us the operator on which is based the generalized forward- 
backward. We first study the properties of this operator before establishing convergence 
and robustness results of our algorithm derived from the Krasnoselskij-Mann scheme 
associated to it. 

Proposition 4.3. For all 7 e ]0, +00 [ n , define 

Tl ' 7 1 z \ [R jA Rn s + Id] x . (9) 

Then, Ti j7 is firmly non- expansive, i.e. Ti )7 



Proof. From Lemma 3.1 R^ i A i an d -Rjv s are non-expansive. In view of Lemma [4.1||(iii' 



R^A is non-expansive as well. Finally, as a composition of non-expansive operators, 
RjaRns i s a l so non-expansive, and the proof is complete by the definition of A (^). □ 

Proposition 4.4. For all 7 e]0,2/3[, define 

T 2,7 : H z ^2 [Id- 7 BJ Ns ] Z . (10) 

Then, T 2j7 a(^). 

Proof. By hypothesis, /3.B e .4, (|) and so is j3B. Then, we have for any x,y 

\\/3BJ Ns x- /3BJ Ns yf < (f3BJ Ns x-j3BJ Ns y\J Ns x-J Ns y) 

= (pj Ns BJ Ns x-j3J Ns BJ Ns y\x-y) 
= (PBJ Ns x-pBJ Ns y\x-y) , (11) 



where we derive the first equality from the fact that Jjv s is self-adjoint (Lemma 



Lemma 



3.1 


(iii) 




(i) 



s 



4.1 



we establish that (3BJn s € A{^. We conclude using |17| Lemma 
2.3]. ~ □ 

Proposition 4.5. For all 7 e ]0,+oo[ n and 7 e ]0,2/3[, Ti )7 T 2j7 e -4(a), with a = 
max (l' 1+2/3/7)- 



Proof. As Ti j7 and T 2j7 are a-averaged operators by Proposition 4.3 and Proposition 4.4 
it follows from |17[ Lemma 2.2 (hi)] that their composition is also a-averaged with the 
given value of a. □ 
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The following proposition defines a maximal monotone operator A 7 which will be useful 
for caracterizing fixed points of Ti j7 T 2i7 as monotone inclusions. 

Proposition 4.6. For all 7 e ]0, + oo[ ra there exists a maximal monotone operator A'^ 
such that Ti )7 = Ja' ■ Moreover for all 7 > 0, 

FixTi i7 T 2i7 = zer(A^+7 J BJ 7Vs ) . (12) 



Proof. The existence of A'^ is ensured by Proposition 4.3 and Lemma 3.1||(iv) Then for 

z = T lj7 T 2)T s o z = (ld + A^ 1 (Id - ~fB J Ns )z 
o z - jBJn s z e z + A 7 z 
-c* e A 7 z + ^BJn s z 

□ 

Now, let us examine the properties of A 7 . 
Proposition 4.7. For a// 7 e ]0, +oo[ n and u,y el-L 

ue A'^y ou s -y L e 7 A(y 5 -w 1 ) , (13) 

where we denote for y e T-L, y s ^proj^ (y) and y L ^proj^i (y). 
Proof. First of all, by definition of Ti j7 we have 

T 1>7 = § [(2 J 7A - Id) (2 Jjv s - Id) + Id] 

= \ [2 J-fA (proj,s - pro^i ) - (proj,s - proj 5 ± ) + proj^ + proj 5 i ] 

= J 7 A(proj 5 - proj 5 ± ) + proj 5 ± . (14) 

By definition we have A 7 = Ti j7 _1 - Id so that 

u e A'^y o ti + yeTi 7 _1 y 
Ti >7 (u + y)=y 

o ^((u + y^-Cw + Z/) 1 ) ^y-iu + y) 1 

<=> (u + y) 5 - (tt + y) 1 e y s - u 1 + 7A [y s - it 1 ) 

<»■ it 5 - y 1 e 7A (y 5 - w 1 ) . 

□ 

We are now ready to state our main result, establishing convergence and robustness 
of the generalized forward-backward algorithm to solve ([6]). 

Theorem 4.1. Let 

(7*)telN be a sequence in ]0, 2B[, 

(lt)teN b e a sequence in ]0, +oo[ n such that Vi, -yt = (^J.> 
(^t)teM ^ e a sequence such that Vi, Ai e /a (made explicit below), 
set Zq e anc? /or every t e IN, set 

zt+i = z t + A t (Ti )7t (T 2j7t z t + e 2 ,t) + £i,t - z t ) (15) 

where Ti )7t (resp. T 2j7ty ) is defined in Q (resp. in ([To]) J, and £\j,£2,t ^H- Set liiajt = 7 
and define the following conditions: 
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(AO) (i) zer(B + Z i A i )*0; 

(ii) 0<limA t <lWAi < min(|,i ± f^); 

(iii) < +°° ^ Eto l £ 2,t|| < +oo- 
(Al) (i) Vt,7 t = 7€]0,2/3[; 



m h = 



0,mm( |, - 



+2/3/7 1 



(A2) (i) < lim-fr < 7 < 2/3; 
(ii) / A =]0,l]. 

Suppose that |(AO)| is satisfied. Then, If either \(Al)\ or |(A2)| is satisfied, 
(i) (Ti i7t T 2j7t Zt - Zt) tgM converges strongly to 0. 



(ii) (zf)j e ]N converges weakly to a point z e F = DteiN FixT]~ t T; 



(iii) (cc t = Ei Wi^i,*)^ converges weakly to x = Ei e zer (S + Ei ^i)- 
Moreover, i/ |(A2)| is satisfied and B is uniformly monotone, then 

(iv) (xt) teIN converges strongly. 



Proof. For sequences in a Hilbert space, strong convergence is denoted by 
convergence is denoted by v . 

OTBfl S uppose first that [(AO)] and [(AT)] are satisfied. 



Under |(Al)| (i) 
we have 



and weak 



T = T\^T2^ does not depend on t (stationary operator). For all t € M, 



z t +i =z t + X t (Tz t + e t - z*) , 



4.3 



shows that T 



1,7 



withe* = Titf(T 2}i yZ t + £2,t)-Titf(T2tfZt) + £i,t- Proposition^ 

is in particular non-expansive, so that \\et\\ < ||£2,i|| + I^mIIi an d we deduce from (AO) 
that EtMDlMI < +0 °- Moreover, by Proposition |4~5| and [(Al)|f(i)j T e A{a) with 



(16) 



in 



a = max 



(f > 1+2/3/ 7 ) • m P ai 'ti cu l ar j ^ is non-expansive and thus F = FixT is closed and 



convex. Now, for i 6 IN, set Xj = Id + A* (T - Id-^), the iterations (16) can be rewritten 



Zt+l = T t z t + \ t £ t . 



(17) 



Since for all t, at = \tct < 1 by (Al) (ii) |17| Lemma 2.2 (i)] shows that T e A(at), 
and (17) is thus a particular instance of |17[ Algorithm 4.1]. Also, it is clear that 
for all t, FixTj = FixT. By Pr oposition 4.1 and Proposition 4.2 |(A0)f(i) provides 
F - ritefjFixTt + 0. According to (A0)]|(ii) limAj > and lima* < 1, so we deduce from 



|17| Theorem 3.1 and Remark 3.4] that 



£ Ttzt-zt 



< +00. 



(18) 



and that (zt) t is quasi-Fejer monotone with respect to F. By definition of T, (18) gives 



Tzt - z t 



< + 00, which in turn implies Tzt - z% — ► since lim At > 0. Then T 

being non-expansive, it follows from the demiclosed principle |11|[4"] Corollary 4.18] that 
any weak cluster point of {zt) t belongs to FixT, so that |J1 Theorem 5.5] provides weak 
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convergence towards z e F. 



Suppose now that [(AO)] and \[KTj\ are satisfied. 
Again with Pro posi tion 4,1[ Proposition |4,2| and (AO) (i)] F + 0. From Proposition 4.3 



Proposition 
tions 



4.4 



and 
and 



(A2) 

m 



Zt — >■ (establishing (i) 



Ti j7t e A (^) and T 2 ^ t e A (2/5) f° r all i. So, under assump- 
|17l Theorem 3.1 and Remark 3.4] provides that Tx~ t T2y t Zt ~ 
, that for any z e F 



(Id -T 2at )z t - (Id -T 2j7t )z 



, 



(19) 



and that (zt), is quasi-Fejer monotone with respect to F. Again, by non-expansivity F 
is closed and convex, and with [3J Theorem 5.5], {zt) t converges weakly to some point 
in F if, and only if, all of its weak cluster points lie in F. 

Let thus y be a weak cluster point of (zj) 4 . (jt)t being bounded, we can extract a 
subsequence (zf r ) T converging weakly towards y such that (jt T ) T converges strongly 



to some (0 < 700 < 2/3 by (A2) (i)). Fix then z e F and observe that (19) implies 
BJ^ s z tr — * BJm s z. 

Since f3BJ^ s e A(|), BJ^ S is continuous and monotone, hence maximal monotone 
[U Corollary 20.25]. Consequently, its graph is sequentially weakly- strongly closed [3J 

Corollary 20.33(h)]. Because BJn s is single-valued and z% T v y, we deduce BJ]\f s y = 

B,J^j s z. 

Now denote for all t, yt = f T\ nt T 2nt Zt and Ut d = f (Id - Ti )Tt ) T 2jlt Zt- It follows from (i) 



that y t -z t — >■ 0, implying y tr — - y. Then, u t = T 2at z t -T lat T 2nt z t = z t -^ t BJ Ns z v 
y t , so that u u — * -j^BJ^y. 

Moreover, u t e ((ld + A 7t ) - Id) T\ nt T 2nt z tl hence u t e A'^ t yt. Thus for all t, 



yt 

—u€ ~ftAv, and by monotonicity 

7oo 



e ~itA(yt S - i^ 1 ) by Proposition 4.7 If (v,u) e gra(7ooA) with 7,0 d = f (^)> then 
m 



yt s - W 



> 



by bilinearity and taking into account orthogonality 



( 



u t s - 

7oo 



yt s - ut 1 



•«) + {yt 1 \ut L + v) > . 



By weak convergence, (yt T ) T is bounded. Together with strong convergence of (ut T ) T 
and (7i T ) , (31 Lemma 2.36] allows to take the limit as r tends to infinity in the above 
inequality. Using BJ]y s y e «S, 

(-7oo# Jjv s y - m I y 5 - w) + (y 1 \ v) > 
{-^ooBJ Ns y - y 1 - u I y s - v) > . 

Hence maximality of TxA forces (y s ,-jooBJ^y - y 1 ) 
i.e. -700S Jjv s y - y 1 e 700A (y 5 ). Thus Proposition 



and by Proposition 4.6 y e FixTi >7oo T2. 



4.7 



6 gra(7ooA), 
provides -^BJ^y e A^y, 



:,7oo 



111 



_ In both cases, for any yeH, (y|ac t -ac) = (y | Y,iUi(zi,t - = Ei (?/ I *i,t - «i) = 
(C(y) \ zt~ z) — »■ since Zt v z, hence weak convergence of (xt)teW towards x, which 



is a zero of B + Y,i Ai by Proposition 4.1 
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(iv)| If B is uniformly monotone, then there exists a non-decreasing function ip ■ [0, +00 [- 
[0, +00] that vanishes only at 0, such that for all x,y € T~L 



(Bx -By\x-y)> <p(\\x - y\\) . 



For all t e M, 



{BJ^ s z t - BJn s z \z t - z) 



Zi^i (B (Ei<jJiZi,t) - B(Y,iUiZi) I - Zi) 
(B (Zi^iZij) - B(Y,i^iZi) I EiUi(zi,t ~ Zi)) 
V (\\Zi^i(zi,t - Zi)\\) = (p(\\xt - x\\) . 



Recall that under 



(AO) 



and 



(BJN s z t - BJn s z\ z t - z) - 
vergence of (xt) t towards x. 



(A2) BJj^ s z t — >■ BJ]\f s z and z t — - z, so that 

0. In view of the properties of tp, we obtain strong con- 

□ 



Remark 4.1. In statements (I)|(iii)| of Theorem |4 . 1 1 unde r (A0)|(Aiy (stationary case), as- 
sumptions [(AO)] can be weakened. More precisely, (A0)||(ii) can be replaced by Y,uk ^t(l _ 



aX t ) = +00 where a = max(2/3, 2/(1 + 2/3/7)), and (AO) (iii) by ZteM M||e u || + ||e 2)t ||) < 



+ 00. The proof wo uld follow the same lines as |17| Lemm a 5.1 



part of assumption (AO) (ii) on limAt is not needed under (A2) 



Let's note also that a 



Remark 4.2 (Strong Convergence). Assumption of uniform monotonicity in the proof 
of statement (iv) can be relaxed. For instance, the sequence (zt) te ™ is indeed quasi- 



Fejer monotone with respect to F. Thus, if intF t 0, strong convergence occurs by |17| 
Lemma 2.8(iv)]. 



Corollary 4.1. Theorem \2. 1\ holds. 

Proof. Let \Zi,t) te ™ an( i { x t)teN ^ e se 1 uences defined in Q. Identifying B with 
VF and A{ with dGi and skipping some calculatio ns, ((^ ,t) jX FTW follows iterations (15) 



with ex,t = and e 2 ,t = C (-7^2,4), providing (AO) 



Theorem 



4.1 



111 



in Theorem 



4.1 



Now, 



under |(H1)H(H2)1 argmin(F + J^d) = zer(VF + EidGi) * 0, providing [(Aj0WT)1 in 



The proof of weak convergence of (xt), w follows from Theorem 



4.1 



tE 



The proof of strong convergence is a consequence of Theorem 4,l||(iv)| together wit 
fact that uniform convexity of a function in Tq(H) implies uniform monotonicity of its 
subdifferential HI. □ 



5 Discussion 

5.1 Special instances 

The generalized forward-backward algorithm can be viewed as a hybrid splitting 
algorithm whose special instances turn out to be classical splitting methods; namely the 
forward-backward and Douglas- Rachford algorithms. 



Relaxed Forward- Backward For n = 1, we have Jn s = Id, A = A = Ai, B = B and 

the operator ([8j specializes to 

^[R lA + ld][U--fB] = J A (ld- 7 B) , (20) 

so that xt-zt- Z\j given by ( |15| ) (resp. Q in the optimization case) follows exactly the 
iterations of the relaxed forward-backward algorithm |17| Section 6] , and its convergence 
properties under assumptions |(A0)| and [(A2)| 
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This comparison is of particular interest in the convex optimization case since it may 
be inspiring to study the convergence rate of the generalized forward-backward on the 
objective. Indeed, it is now known that the exact forward-backward algorithm enjoys a 
convergence rate in 0(l/t) on the objective |55l 17], Furthermore, there has been several 
accelerated multistep versions of the exact forward-backward in the literature |55[ [6l |7D] 
with a convergence rate of 0(l/t 2 ) on the objective (although no convergence guarantee 
on the iterate itself is given). Therefore, two possible perspectives of this work would 
be to investigate the convergence rate (on the objective of course) of the generalized 
forward-backward and to design a potential multistep acceleration. 



Relaxed Douglas-Rachford If we set B = 0, the operator ([8| becomes 

1, 



-[R^aRns +Id] . 



(21) 



Taking jt = 7 e ]0, +oo[, Vi, Zt provided by ( 15 ) (resp. Q in the optimization case) would 
be equivalent to applying the relaxed Douglas-Rachford algorithm on the product space 
H for solving 6 [65, 21]. The convergence statements of Theorem 4.1 (I)|(iii) 

holds in this case under (AO)f( iJ X t e]0, 2[ with XteiN Af(2 -At ) = +oo and £teiN At(||ei t t| + 



l £ 2.t||) < +°°; see Remark 4.1 where a - ^ by Proposition 



4.3 



Resolvents of the sum of monotone operators The generalized forward-backward 
algorithm provides yet another way for computing the resolvent of the sum of maximal 
monotone operators at a point y e ran(Id+ £j Ai). It is sufficient to take in ^ Bx - x-y 
and (3-1. It would be interesting to compare this algorithm with the Douglas-Rachford 
and Dykstra-based variants |18| . This will be left to a future work. 



5.2 Relation to other work 

Relation to [53J In a finite-dimensional setting, these authors propose an algorithm 
for the monotone inclusion problem consisting of the sum of a continuous monotone 
map and a set-valued maximal monotone operator, introducing a "block-decomposition" 
hybrid proximal extragradient (HPE).They also derive the corresponding convergence 
rates. 

More precisely, our optimization problem can be rewritten in the form considered 
in \53\ Section 5.3, (51)]. Indeed, ([T| is equivalent to the linearly constrained convex 
problem 

min F (z^iZi) + Y l G i( z i) such that proj^i (z) = , (22) 

As proj^i is self-adjoint, z is an optimal solution if and only if there exists v = {vi)i e 
such that 

Oe (V F (z^iZi)) l + (dGiiz^/w^t + pioi s i(v) and proj^i (z) = , 

and the minimizer is given by x = Y.i^i z %- 

Let ? e]0, 1] and 7 = q — = = . Transposed to our setting, their iterations read: 
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Algorithm 2 Iterations Block-Decomposition HPE [53J. 

for i e [1, n] do 

Zi <r- prox iG . (j 2 x + (l -J 2 )zi -jWF(x) + r y(v i -u)); 

for % e [1, n] do 

L Vi-t-Vi- + 72; 

X Yi^iZu 



The update of the ZiS in this iteration shares similarities with the one in Algorithm [T] 
where 7 is identified with 7^. Nonetheless, the two algorithms are different in some 
important ways. Our algorithm is robust to errors while there is no proof of such robust- 
ness for HPE. Furthermore, HPE carries additional (dual) variables hence increasing the 
computational load of the algorithm. Finally, unlike our algorithm, the step-size in HPE 
7 cannot be iteration-varying, and 7 < <r whatever the Lipschitz constant of VF, which 
is a stronger condition than ours. The latter can have important practical impact. 

Relation to [23J While this paper was being released, these authors independently 
developed another algorithm to solve a class of problems that covers (J6|. They rely 
on the classical Kuhn- Tucker theory and propose a primal-dual splitting algorithm for 
solving monotone inclusions involving a mixture of sums, linear compositions, and paral- 
lel sums (inf-convolution in convex optimization) of set-valued and Lipschitz operators. 
More precisely, the authors exploit the fact that the primal and dual problems have 
a similar structure, cast the problem as finding a zero of the sum of a Lipschitz con- 
tinuous monotone map with a maximal monotone operator whose resolvent is easily 
computable. They solve the corresponding monotone inclusion using an inexact version 
of Tseng's forward-backward-forward splitting algorithm [69j. 

Removing the parallel sum and taking the linear operators as the identity in |23[ 
(1.1)], one recovers problem ([6]). For the sake of simplicity and space saving we do not 
reproduce here in full their algorithm. However, adapted to the optimization problem 
mm x F(x) + Y,iGi(Lix), where each Li is a bounded linear operator, their algorithm 
reads (Gi* is the Legendre-Fenchel conjugate of Gi): 

Algorithm 3 Iterations Primal-Dual Combettes-Pesquet (23|. 
y^x- lt {VF{x) + YliU*v i ) 
for i 6 [1, n] do 

Zi <-Vi + 7 t Lix; 
_ Vi<-Vi- Zi + prox 7tG > (z^ + jtLiy; 

x+-x- lt (vF (y) + EiLi L*( prox 7t(V (%))); 



Recall that the proximity operator of Gi* can be easily deduced from that of Gi using 
Moreau's identity. Taking Li = Id in Algorithm [3] solves ([T]). Similarly to the the 
generalized forward-backward, this algorithm allows for inexact computations of the 
involved operators and for varying step-size 74. However, if t = l/f3 denotes the Lipschitz 
constant of F, the bound on our step-size sequence is 2/£ while theirs is l/(£ + \/n), at 
least twice lower and degrading as n increases. While we solve the primal problem, their 
algorithm solves both the primal and dual ones, which at least doubles the number of 
auxiliary variables required. Moreover, it also requires two calls to the gradient of F per 
iteration. Nonetheless, their algorithm is able to solve a more general class of problems. 



16 



Finally, let us notice that if one want to use the composition with linear operators, 
each iteration requires two calls to each one of them and two calls to their adjoints, what 
can be computationally more expensive than computing directly the proximity operators 



It is also noteworthy to point out that Tseng's forward-backward-forward algorithm 
they used is a special case of the HPE method whose iteration complexity results were 
derived in |52) . 

6 Numerical experiments 

This section applies the generalized forward-backward to image processing problems. 
The problems are selected so that other splitting algorithms can be applied as well and 
compared fairly. In the following, Id denotes the identity operator on the appropriate 
space to be understood from the context, N is a positive integer and I = M, NxN is the 
set of images of size N x N pixels. 

6.1 Variational Image Restoration 

We consider a class of inverse problem regularizations, where one wants to recover 
an (unknown) high resolution image yo € X from noisy low resolution observations y = 
$yo + w € I. We report results using several ill-posed linear operators : I -> X, and 
focus our attention to convolution and masking operator, and a combination of these 
operators. In the numerical experiments, the noise vector w e X is a realization of an 
additive white Gaussian noise of variancecr^. 

The restored image yb = Wx is obtained by optimizing the coefficients x e H in 
a redundant wavelet frame |49j . where W ■ H -*■ X is the wavelet synthesis operator. 
The wavelet atoms are normalized so that W is a Parseval tight frame, i.e. it satisfies 
WW* = Id. In this setting, the coefficients are vectors x € % = X J where the redundancy 
J = 3 Jo + 1 depends on the number of scales Jo of the wavelet transform. 

The general variational problem for the recovery reads 



The first term in the summand is the data-fidelity term, which is taken to be a squared 
^2-norm to reflect the additive white Gaussianity of the noise. The second and third 
terms are regularizations, enforcing priors assumed to be satisfied by the original image. 
The first regularization is a l\\ii-norm by blocks, inducing structured sparsity on the 
solution. The second regularization is a discrete total variation semi-norm, inducing 
sparsity on the gradient of the restored image. The scalars \i and v are weights - so- 
called regularization parameters - to balance between each terms of the energy ^. We 
now detail the properties of each of these three terms. 

6.1.1 Data-Fidelity <$>Wxf 

For the inpainting inverse problem, one considers a masking operator 



Where ft is a set of pixels, taking into account missing or defective sensors that deteriorate 
the observations; we will denote p = \Q\/N 2 the ratio of missing pixels. For the deblurring 




min{^(x) = f -\\y - $Wxf + p\\x\\f 2 + 4Wx\\ T v} . 

x^H 2 



(23) 
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inverse problem, we consider a convolution with a discrete Gaussian filter of width a, 
K ■ y >-*• g a * y, normalized to a unit mass. This simulates a defocus effect or low- 
resolution sensors. In the following, we thus consider <I> being equal either to M, K or 
the composition MK. 

Denoting L d = f the fidelity term thus reads F(x) = ^\\y - Lx\\ 2 . The function F 
corresponds to the smooth term in (fTl). Its gradient S/F : x L* (Lx - y) is Lipschitz- 
continuous with constant < ||$W| = 1. 

For any 7 > 0, the proximity operator of F reads 

prox 7F (x) = (Id + 1 L*L)' 1 (x + 7 L*y) . (24) 

The vector L*y can be precomputed, but inverting Id+7L*L may be problematic. For 
L = Id, this is trivial. For inpainting or deblurring alone, as <E> is associated to a Parseval 
tight frame, L = MW or L = KW, the Sherman-Morrison- Woodbury formula gives 

(H+7LU)" 1 =Id-L*(Id+7LL*) _1 L 

= Id-W*<I»*(Id+7$<I>*)~ 1 <lW . (25) 



Since M (resp. K) is a diagonal operator in the pixel domain (resp. Fourier domain), ( |25[ ) 
can be computed in 0(N 2 ) (resp. 0(N 2 log N)) operations. However, the composite 
case L = MKW is more involved. An auxiliary variable is required, replacing F ■ % -*■ R 
by F : % x X -»•]— 00, +00] defined by 

F(x,u)= -\\y - Muf + l Ckw (x,u) = Gi(x,u) + G 2 (x,u) , (26) 
where Ckw d = {{ x , u)eHxI\u = KWx}. Only then, prox 7Gl can be computed from 



(24), and prox 7Ga is the orthogonal projection on 



ker([Id, -ii'W]) [29, 10J, which involves a similar inversion as in (25). 



6.1.2 Regularization /tx|x|f 2 

Sparsity-promoting regularizations over wavelet (and beyond) coefficients are popular 



to solve a wide range of inverse problems |49j . Figure 1(a) left, shows an example of 
orthogonal wavelet coefficients of a natural image, where most of the coefficients have 
small amplitude, they are thus quite sparse. A way to enforce this sparsity is to use the 
^i-norm of the coefficients ||x||i = Epl^pl- 

The presence of edges or textures creates structured local dependencies in the wavelet 
coefficients of natural images. A way to take into account those dependencies is to replace 
the absolute value of the coefficients in the ^i-norm by the ^2-norm of groups (or blocks) 
of coefficients [7T]. This is known as the mixed £i/^2-norm by 



llB 



\\ X \\T,2 = E MbWI = E Mb. /£ x l , (27) 

heB be/3 y P eh 

where p indexes the coefficients, the blocks b are sets of indexes, the block- structure B 
is a collection of blocks and Xb d = f ( x p) p(i \ 3 ls a subvector of x. The positive scalars /Ub 
are weights tuning the influence of each block. It is a norm on T-L as soon as B covers 
the whole space, i.e. V p e \l,Nf x [l, J], 3b e B s.t. p 6 B. Note that for B = U P {p} and 
fi{ p y = 1 for all p, it reduces to the ^i-norm. 

We mentionned in the introduction that the proximal operator of a ^i-norm is a 
soft-thresholding on the coefficients. Similarly, it is easy to show that whenever B is 
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non-overlapping, i.e. V b, b' e B, bnb' = 
thresholding by block 



the proximity operator of 



|| j 3 2 is a soft- 



prox Mi-i? 2 ( ( Xb )b ) = (€WOb)), 



with 



Q^(sb) = 



i 1 ra) 



if ||x b | < /U , 
otherwise , 



and the coefficients x p not covered by B remaining unaltered. 

Non-overlapping block structures break the translation invariance that is underlying 
most traditional image models. To restore this invariance, one can consider overlapping 
blocks, as illustrated in Figure 1(c) Computing proX|.|8 in this case is not as simple as 
for the non-overlapping case, because the blocks cannot be treated separately. For tree- 
structured blocks (i.e. bnb'*0=>bcb'orb'cb), |43j proposes a method involving 
the computation of a min-cost flow. This could be computationally expensive and do not 
address the general case anyway. Instead, it is always possible to decompose the block 
structure as a finite union of non-overlapping sub-structures B = Ui^. The resulting 

EbeB \\xb\\ = T.i T,beB t \\ x b\\ = Ei Hf* 2 , where each 



term can finally be split into 
|| • ||^ 2 is simple. 
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Figure 1: Illustration of the block ^i/^-norm. (a) sparsity of the image in an orthogonal 
wavelet decomposition (gray pixels corresponds to low coefficients) ; |(b)| a non-overlapping 
block structure; |(c)|splitting of a more complex structure into two non-overlapping layers. 



In our numerical experiments where H=X J , coefficients within each resolution level 
(from 1 to J) and each subband are grouped according to all possible square spatial 
blocks of size S x S; which can be decomposed into S 2 non-overlapping block structures. 



6.1.3 Regularization ^||1Fx||tv 

The second regularization favors piecewise-smooth images, by inducing sparsity on 
its gradient [63j. The total variation semi-norm can be viewed as a specific instance of 
h/h-norm, ||y|| T v = ||Vxy||?! v , with 



1 



X 2 

(V *y,H *y) 



and ||0a)||5 v = £ \/ 

P 6[l,iV]2 



Dp 2 + hp , 



where the image gradient is computed by finite differences through convolution with a 
vertical filter V and a horizontal filter H, and B^y is clearly non-overlapping. For some 
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special gradient filters, the modified TV semi-norm can be splitted into simple functions, 
see for instance |21[ 161] , However, we consider more conventional filters 



centered in the upper-left corner. Introducing an auxiliary variable as advocated in 



(26), the main difficulty remains to invert the operator (Id+7VxVx*). Under appro- 
priate boundary conditions, this can be done in the Fourier domain in 0(N 2 log(iV)) 
operations. 

6.2 Resolution with Splitting Methods 
6.2.1 Tested Algorithms 

We now give the details of the different splitting strategies required to apply the 



three tested algorithms to ( 23 ) . 



Generalized Forward-Backward (GFB) The problem is rewritten under the form 
as 

1 



S 2 



mm-\\y - MKWxf + M£Nlf 2 + u\\u\\f^ + tc VxW (x,u) , (28) 

u<lT z 

with F(x) = - MKWx\\ 2 and n = S 2 + 2. The indicator function ic VxW is defined 
similarly as in (26). In Algorithm [T] we set equal weights u){ = a constant gradient 



step-size 7 = 1.8/3 and a constant relaxation parameter to A = 1. 
Relaxed Douglas-Rachford (DR) Here the problem is split as 



s- 



mm h\y - Muxf + l Ckw (x,ui) +^^Nlf,2 + ^ll^llf^ + ^c VxW (x,u 2 ) 

X€ti Z -_i 



and solved with Algorithm [l] where FeO and n = S 2 + A. As mentioned in Section [5J this 
corresponds to a relaxed version of the Douglas-Rachford algorithm, with best results 
when 7 e 1/n. 

Primal-Dual Chambolle-Pock (ChPo) A way to avoid operator inversions is to 
rewrite the original problem as 



where 
and 

G : 



A: 



minG(Ax) 



% — * ix (nf xi 2 

X H- 



(MKWx,x,...,x, ViWx) 

ix(n) s2 xi 2 r 



(ui,x 1 ,...,x S 2,g) 1 — ► ^||y-ui|| 2 + ^X;f=il|a;i||f i 2 + ^ll5'llf,2 V 



The operator A is a concatenation of linear operators and its adjoint is easy to compute, 
and G is simple, being a separable mixture of simple functions. Note that this is not the 
only splitting possible. For instance, one can write the problem on a product space as 
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min is(( x i)i) + Y,i Gi(AiXi), where Gi is each of the functions in G above, and Aj is 

each of the linear operators in A. 

To solve this, we here use the primal-dual relaxed Arrow-Hurwicz algorithm described 

in |12j . According to the notations in that paper, we set the parameters a = 1, t = 
Q-9 flnr i = i 

<t(1+S 2 +8) allCl - L - 



Block-Decomposition Hybrid Proximal Extragradient (HPE) We split the prob- 
lem written in (28) according to (22), and set equals weights W{ = 1/n. According to 
Section 5.2 we set the parameter ? = 0.9. 



Primal-Dual Combettes-Pesquet (CoPe) Finally, the problem takes its simplest 
form 

1 52 
fy-MKWxf + nY, H% + HIVzWsDJT ■ ( 29 ) 

i=l 



mm- , 

xeH2 



As long as v = (no TV-regularization) , this is exactly ( 28 ) ; we apply Algorithm [3] 
where Li = Id for all i and 7 = 0.9/(1 + S). However with TV-regularization, we avoid the 
introduction of the auxiliary variable u with L S 2 + i = VxW and 7 = 0.9/(1 + V S 2 + 8). 



6.2.2 Results 

All experiments were performed on a discrete image of width N = 256, with values in 
the range [0, 1]. The additive white Gaussian noise has standard-deviation a w = 2.5-10 -2 . 
The reconstruction operator W uses non-separable, bi-dimensional Daubechies wavelets 
with 2 vanishing moments. It is implemented such that each atom has norm 2 - - 7 , with 
j e [1, JqJ and where Jo is the coarsest resolution level. Accordingly, we set the weights 
//b m the li/i^-norm to 2~ 3 at the resolution level j of the coefficients in block b. We 
use Jo = 4, resulting in a dictionary with redundancy J = 3Jo + 1 = 13. All algorithms 
are implemented in MATLA^] 

Results are presented in Figures [2] [3j [4] and [5] For each problem, the five algorithms 
were run 1000 iterations (initialized at zero), while monitoring their objective functional 
values ^ along iterations, ^min is fixed as the minimum value reached over the five 
algorithms (in our experiments, this was always the generalized forward-backward), and 
evolution of the objectives compared to ^min is displayed for the first 100 iterations. 
Because the computational complexity of an iteration may vary between algorithms, 
computation times for 100 iterations (no parallel implementation) are given beside the 
curves. Below the energy decay graph, one can find from left to right the original image, 
the degraded image and the restored image after 100 iterations of generalized forward- 
backward. Degraded and restored images quality are given in term of the signal-to-noise 
ratio (SNR). 



Comparison to algorithms that do not use the (gradient) explicit step (ChPo, 

DR) For the first three experiments, there is no total variation regularization. In the 
deblurring task (Figure [2|, blocks of size 2x2 are used. GFB is slightly better than 
the others and iteration cost of ChPo is too high for this problem. When increasing 
the number of block structures (inpainting, Figure [3j size 4x4) computation times 
tends to be similar but GFB clearly outperforms the others for the task. However, one 
advantage of using the gradient becomes obvious in the composite case {i.e. <I> = MK): 



An implementation of the generalized forward-backward, as well as the codes and materials for the 



experiments, are available at http://www.ceremade.dauphine.fr/~raguet/ 
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in Figure[4| DR performs hardly better than ChPo. Indeed, in contrast to previous cases 



(see Section 6.1.1 ), F is not simple anymore and the introduction of the auxiliary variable 
decreases the efficiency of each iteration of DR. This phenomenon is further illustrated in 
the last case, where the total variation is added, introducing another auxiliary variable. 

Comparison to algorithms that use the (gradient) explicit step (HPE, CoPe) 

In the first experiment where n is small, the iterations of HPE and CoPe are almost as 
efficient as the iterations of GFB but take more time to compute, especially for CoPe 
that needs twice more calls to VF. In the second setting, HPE and CoPe are hardly 
better than DR, maybe suffering from small gradient step-sizes. They perform better in 
the composite setting, but require more computional time than GFB. In the last setting, 
iterations of CoPe are still not as efficient as iterations of GFB in spite of their higher 



computational load due to the composition by the linear operator \7zW (see ( 29 ) ) 



Finally, let us note that in the composite case (i.e. = MK), the SNR of the 
restored image is greater when using both regularizations rather than one or the other 
separately. Moreover, we observed that it yields restorations more robust to variations of 
the parameters fi and v. Those arguments seem to be in favor of mixed regularizations. 



7 Conclusion 

We have introduced in this paper a novel proximal splitting method able to handle 
convex functionals that are the sum of a smooth term and several simple functions. 
It generalizes existing schemes by enlarging the class of problems that can be solved 
efficiently with proximal methods to the case where one of the function is smooth but 
not simple. We provided theoretical guarantees on the convergence and robustness of the 
algorithm even for the more general problem of finding the zeros of the sum of maximal 
monotone operators, one of which is also co-coercive. Numerical experiments on convex 
optimization problems encountered in inverse problems show evidence of the advantages 
of our approach for large-scale imaging problems. 

In analogy with first-order methods such as the forward-backward algorithm, estab- 
lishing convergence rates (on the objective) and designing multistep accelerations are 
possible perspectives that we leave to a future work. 
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(a) log(\& - ^min) vs. iteration # 



(b) computing time 



(c) LaBoute yo 



ChPo 

DR 

HPE 

CoPe 

GFB 



^ChPo 

^DR 

^HPE 

^CoPe 

^GFB 



= 153 S 
= 95 S 

= 148 s 
= 235 s 
= 73 s 




(d) y = Ky + w, 19.63 dB 



(e) y = Wx, 22.45 dB 



Figure 2: Deblurring: a = 2; fjt = 1.3 ■ 10~ 3 ; S = 2; v = 0. 



(a) logf^ - ^min) vs. iteration # 



(b) computing time 



^ChPo 

^DR 

^HPE 

^CoPe 

^GFB 



= 229 S 
= 219 S 

= 352 s 
= 340 s 
= 203 s 




(c) LaBoute yo 



(d) y = Myo + w, 1.54 dB 



(e) y = Wx, 21.66 dB 



Figure 3: Inpainting: p = 0.7; \x = 2.6 ■ 10 3 ; S = 4; v = 0. 
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(a) log(^ - ^ m i n ) vs. iteration # (b) computing time 




= 313 s 
= 256 s 
= 342 s 
= 268 s 
= 233 s 



20 40 60 80 100 




(c) LaBoute y (d) y = MKy + w, 3.93 dB (e) y = Wx, 20.77 dB 



Figure 4: Composite: a = 2; p = 0.4; p = 1.0 ■ 10~ 3 ; S = 4; 1/ = 0. 



(a) log(^ - ^min) vs. iteration # (b) computing time 




(c) LaBoute y (d) y = MKyo + w, 3.93 dB (e) j/b = Wx, 22.48 dB 



Figure 5: Composite: a = 2; p = 0.4; /x = 5.0 ■ 10~ 4 ; S = 4; v = 5.0 ■ 10" 3 . 
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